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Abstract 

We develop a recursion for hidden Markov model of any order h, which allows us 
to obtain the posterior distribution of the latent state at every occasion, given the 
previous h states and the observed data. With respect to the well-known Baum- 
Welch recursions, the proposed recursion has the advantage of being more direct to 
use and, in particular, of not requiring dummy renormalizations to avoid numerical 
problems. We also show how this recursion may be expressed in matrix notation, 
so as to allow for an efficient implementation, and how it may be used to obtain the 
manifest distribution of the observed data and for parameter estimation within the 
Expectation-Maximization algorithm. The approach is illustrated by an application 
to financial data which is focused on the study of the dynamics of the volatility level 
of log-returns. 
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1 Introduction 



Hidden Markov (HM) models have become a popular statistical tool for the analysis of 
data having a time-series structure; for an up-to-date review see Zucchini and MacDonald 
(2009). These models have also found great interest for the analysis of longitudinal data, 
where independent short time series are observed for typically many statistical units; for 
a review see Bartolucci et al. (2010). HM models are based on the assumption that the 
observable random variables, corresponding to the different time occasions, are condition- 
ally independent given an unobservable (or latent) process, which follows a Markov chain. 
Usually, this Markov chain is assumed to be of first order and time homogenous, so that 
the transition probabilities are time invariant. 

A fundamental tool of inference for HM models is represented by forward-backward 
recursions of Baum and Welch (see Baum et al., 1970; Welch, 2003). For a first-order HM 
model, these recursions allow us to compute the manifest probability (or density) of the 
observed sequence of data and to obtain the posterior distribution of every latent state 
and of every pair of consecutive latent states given these data. Through this recursion is 
then possible to implement an Expectation-Maximization (EM) algorithm (Baum et al., 
1970; Dempster et al., 1977) for maximum likelihood estimation of the parameters and to 
perform local decoding (Juang and Rabiner, 1991), that is to find the most likely state 
at every occasion, given the observed data. Despite its popularity, the Baum- Welch re- 
cursions may suffer from numerical problems due to the fact that certain probabilities 
may become negligible. This problem typically requires to implement dummy renormal- 
izations; see Scott (2002) for further comments and Lystig and Hughes (2002) for an 
alternative solution in dealing with the manifest distribution of the observed data. 

In a rather recent paper, Bartolucci and Besag (2002) proposed a probabilistic result 
to obtain the marginal distribution of a random variable in Markov random field model 
and mentioned that this result may be also used for HM models, providing an example 
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for a first-order and a second-order HM model. Developing the intuition of Bartolucci 
and Besag (2002), in this paper we propose a general recursion to deal with HM models 
of any order h. This recursion allows us to obtain the posterior distribution of every 
latent state given the previous h states and the observed data. With respect to the 
Baum- Welch recursions, the proposed recursion has the advantage of being more direct 
to use, especially with higher-order HM models. Moreover, it does not require dummy 
renormalizations. 

We show how the proposed recursion may be used to obtain the manifest distribution of 
the observed data and the required posterior probabilities to implement the EM algorithm 
for parameter estimation. Moreover, the recursion may be directly used for local decoding 
and for prediction. In order to allow for an efficient implementation, we also express the 
proposed result in matrix notation. Such an implementation in the R language is available 
to the reader upon request. 

The remainder of the paper is organized as follows. In the following section we briefly 
review HM models and the Baum- Welch recursion. The proposed recursion is illustrated 
in Section 3, whereas in Section 4 we illustrate its use for maximum likelihood estimation, 
local decoding, and prediction. Finally, in Section 5 we provide an illustration by an 
application based on an HM version of the stochastic volatility (SV) model for financial 
data (Taylor, 2005), in which we assume the existence of discrete levels of volatility. 

2 Preliminaries 

Consider a sequence of T manifest random variables Yi, . . . ,Yt which are collected in 
the vector Y. A hidden Markov (HM) model assumes that these random variables are 
conditional independent given the unobservable random variables Ui, . . . , Ut which follow 
a Markov chain with k states. We consider in particular a Markov chain of order h so 
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that 



p{u t \ui, ut-i) = p(ut\ut- h , ttt-i), t = h + 1, . . . , T, 



where we use the notation p(u t \ui, . . . , u t -i) = P{U t = u t \Ui = u±, . . . , U t -i = u t -i). 
A similar notation will be adopted throughout the paper to denote probability mass 
functions, in a way that will be clear from the context. It is also assumed that every 
Y t depends on the latent process only through U t and then by f(yt\ut) we denote the 
probability mass (or density) function of this distribution. 

The specific HM model adopted in an application is based on assumptions on the above 
transition probabilities, such as that these probabilities are time homogeneous. These 
assumptions may also concern the distribution of each response variable given the corre- 
sponding latent variable. The specific formulation may also involve covariates, if available. 
In this section, however, we remain in the general context described above and base most 
results on the unspecified transition probability function p{ut\u ma ^(t-h,i), ■ ■ ■ , u t -i) and the 
conditional response probability (or density) function f(y t \u t ). Note that, in denoting the 
transition probabilities, we use the index max(t — h, 1) in order to have a notation that 
is suitable even for t < h. Obviously, when t = 1, the conditioning argument in these 
probabilities vanishes and they reduce to initial probabilities of type p(«i). 

The following example clarifies a possible formulation of an HM model for time-series 
data. For other examples in the context of longitudinal data see Bartolucci et al. (2010). 

Example 1 Consider an HM version of the SV model for financial data (Taylor, 2005), 
which is based on the assumption that, given U t , the log-return Y t has a normal distribution 
with mean and variance depending on Ut- In particular, we assume that 



where a v , v = 1, . . . , k, are volatility levels associated to the different latent states. We 
also assume that the underlying Markov chain is of order h and is time-homogenous, so 
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that, for all t > h, we have 

p(u t \u t -h, • • • , Wt-l) = ^u t _ h ,...,u t , 

where n Vu ... lVh+1) v±, . . . , Vh+i = l,...,k, are common transition probabilities to be es- 
timated together with <7i,...,<7fc. Other parameters to be estimated are the initial and 
transition probabilities for t ^ h. These parameters are denoted by 

\«max(t-h,l),-,«t = P( M t| M max(t-M)> ■ ■ ■ i U t-l) ■ 

Overall, taking into account that the initial probabilities are such that ^2 Ul X± jU1 = 1 and 
similar constraints hold for all transition probabilities, the number of free parameters is 



h-l 




It has to be clear that the same modeling framework described above may be adopted 
with longitudinal data in which we observe short sequences of data for n sample units, 
which are usually assumed to be independent. However, we do not explicitly consider the 
case of longitudinal data since the theory that will be developed easily apply to this case 
as well. 

In order to efficiently compute the probability (or the density) of an observed sequence 
of T observations, collected in the vector y = (yi, . . . , z/t), Baum and Welch (Baum et al., 
1970; Welch, 2003) proposed the following forward recursion for a first-order HM model: 

f(ut,y^t) = ^2f(ut-i,y <t -i)p( u t\ut-i)f(yt\u t ), t = 2,...,T, (2) 

Ut-l 

where y^ t = (y±, . . . ,y t ). This recursion is initialized with f(ui,yi) = p{u\) f {y\\u\) and, 
in the end, we obtain the manifest probability (or density) function of y as 

/(w)=x; /(«*.»)■ 
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Moreover, Baum and Welch introduced the backward recursion 

f{y>t\ u t) = ^2f(y>t+i\ u t+i)p(ut+i\ut)f(yt+i\u t +i), t = i,...,t-i, (3) 

ut+i 

where y >t = (yt+i, ■ ■ ■ ,Vt), which is initialized with f{y >T \u t ) = 1. Using this recursion, 
we can obtain the posterior probability of every latent state given the observed data, that 
is q(ut\y) = P{U t = u t \Y = y). In particular, we have 

q{u t y) = J7-, , t = i,...,T, 

whereas for the posterior probability of every pair of consecutive states we have the 
posterior probability 

/ i v f(ut-i,y <t )p(ut\ut-i)f(yt\u t )f(y >t \u t ) 
q{ut-i,u t \y) = 7^ , t = 2,...,T. 

As mentioned above, the Baum- Welch recursions suffer from the problem of numerical 
instability due to the fact that, as t increases, the probability in (2) becomes negligible. 
The problem is evident when T is large and also affects the probabilities in (3). This 
problem requires suitable renormalizations; see Scott (2002) for a more detailed descrip- 
tion. 

3 Proposed recursion 

Developing a result due to Bartolucci and Besag (2002) for Markov random fields, in this 
section we propose how to compute the posterior probabilities 

g(wt|wmax(t-M)> • • • ,ut-i,y), t = l,...,T, (4) 

that is the conditional probability of a certain realization of U t , given U mas _i t -h,i)i ■ ■ ■ > ^t-i 
and a certain configuration of responses collected in the vector y. 

For last time occasion, that is when t = T, the above probability may be simply 
computed as 

/ i \ /(yr|«r)p(«r|MmaxCr-h,i), • • • ,«r-i) ,^ 
q{uT\u max (T-h,i), ut-i, y) = s , (5) 

C[U max (T-h,l), ut-i, Vt) 
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where c(u max ^-h,i), ■ ■ ■ > ut-i, Vt) is the normalizing constant equal to the sum of the 
numerator of (5) for all the possible values of Ut- 

Now consider the following Theorem that allows us to compute the conditional prob- 
ability in (4) for t smaller than T and is related to Theorem 1 of Bartolucci and Besag 
(2002). 

Theorem 1 We have that 

q{ut\um^(t~h,i), ■ ■ .,Ut-i,u t +i, ■ ■ .,u t+j ,y) = 

Q(ut+j+i\ Mmax(i+i+l-h,l)j ■ • • > u t+j, V) 

^ q(u t \u max (t- h>1 ), « t _i, u t+u . . . , u t+j+1 , y) 
u t+j+i 

for t = 1, . . . , T — 1 and j = 0, . . . , min(/i, T — t) — 1 and where the conditioning variables 
u t+ i, . . . , u t+ j at Ihs vanishes for j = 0. 

Proof First of all consider that the assumption that the latent Markov process is of order 
h implies that 

q(v>t+j+i\ ) = g(Ut+i+l|«nu«(t-M)> • • • ' V) 

and then we have 

q{u t \u max ( t -h,l), U t -l, U t +U Ut+j+1, y) P(Ujnax(t-h,l), Ut+j, V) 

Consequently, the sum in (6) is equal to 

PQmax(f-M)' • • • , U t -1, U t +1, ■ ■ .,U t +j,y) 

p(u max{t -h,i), ■ ■ .,u t +j,y) 

and the Theorem holds. □ 

On the basis of the above result, we implement a backward recursion finalized to 
computing the probabilities in (4). As already mentioned, for t — T these probabilities 



(6) 
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may be directly obtained from (5). Then, in reverse order for t = 1, . . . ,T — 1 we first 
compute the posterior probabilities 

g(«t|tW(t-M), • • • ' • • • ' M *+j> 

with j = min(T — t,h). Since f/j is conditionally independent of Yi, . . . , Yj_i, Vt+i, • • • > 
given £/ max (t-/i,i), • • • , Z/t— i, f^+i, • • • , ^t+j, and Ft, we have that the above probability is 
equal to 

f(Vt\Ut) U.LoP( U t+l\ U max(t+l-h,l), ■ ■ ■ , Ut+l-l) ^ 



C(>max(i-M)' • • • > U t-h U t+1, ■ ■ ^Ut+j^t) 

The normalizing constant at the denominator is obtained by summing the numerator for 
all possible values of u t . Then we apply result (6) from j = min(T — t, h) — 1 to j = 0, so as 
to recursively remove the dependence of U t on U t +j+\ and obtaining the target posterior 
probabilities q(u t \u max{t -h,i), • • • , ut-i, y)- 

In order to clarify the above algorithm, we explicit consider below the case of a first- 
order and a second-order HM model. 

Example 2 For a first- order model (h = 1), the algorithm consists of first computing the 
probabilities 



q(ur\uT-i,yT) 



f(y T \uT)p(u T \uT- 



c(ut-i,Vt) 

Then, we for t — 1, . . . , T — 1 we apply the rule in (6) in reverse order. In particular, for 
T ^ 3, we have 

-i -l 

q(u t+l \u t ,y) 



q(u t \u t -i,y) 



E 



Mt+l 



q(ut\u t -i,u t+1 ,y) 



t = 2,...,T-l, 



and 



where 



q(ui\y) 



E 



q{u 2 \ui,y) 



-i -l 



q{ui\u2,yi) 

q(u t \ut-i : ut + i,y t ) 



q(ui\u 2 ,y) 

/(l/l|MlMM2|ttl) 

c(u 2 ,yi) 

f(yt\ut)p(ut\ut-i)p(u t+ i\ut) 
c(u t _i,u t+ i,y t ) 



t = 2,...,T-l. 



Example 3 For a second-order model (h = 2), the algorithm consists of first computing 
the probabilities 

, , v f(yT\UT)p(u T \UT-2,UT-l) 

q(u T \UT-2,UT-l,yT) = } ; • 

C{U T -2,UT-l,yT) 

Then, for t = 1, . . . , T — 1 we apply the rule in (6) for j = 2 (provided that t ^ T — 2) 
and then for j = 1. In particular, assuming that T ^ 4, we first compute 

f(y T -i\uT-i)p(uT-i\u T -3, u t _ 2 )p{ut\ut-2, ut-i) 



g(n r _i|u T _ 3 , n T _ 2 , u T , yr) 
and consequently 

q{uT-\\u T -z,UT-2,y) 



E 



c(u T -3,U T -2,UT,yT) 



q(u T \u T -2,UT-i,y) 

g(u T _i|ti T _3,Uf_ 2 ,M T ,y) 



Then in reverse order for T = 3, . . . , T — 2, we first compute 

f(yt\ut)p(ut\u t -2, Ut-i)p(ut+i\ut-i,Ut) 



q{u t \u t -2, u t -i, u t+ u u t+ 2, yt) 



c(u t -2,U t -l, Ut+l, Ut+2, yt) 

xp{u t +i\u t -i, u t )p(u t+2 \u t , u t+1 ), 



we remove the dependence of U t on U t +2 by computing 

q(u t+2 \u t ,u t+1 ,y) 



q(u t \ut-2,u t -i,u t+1 ,y) 



E 



Ut+2 



q(ut\ik-2, ut-i, u t +i, u t+2 , y) 



-i 



and finally we remove the dependence on Ut+i by computing 



q{ut\ut-2,u t -i,y) 



E 



Mt+l 



q(u t+1 \u t _i,u t ,y) 
q(u t \ut-2,u t -i,ut+i,y) 



In the end, we use similar rules to obtain q(u2\ui, y) and consequently q(u\\y) on the basis 
of 

f(yi\Ul)p(u2\Ul)p(u 3 \u 1 ,U 2 ) 



q(ui\u 2l u 3l yi) 
q(u 2 \ui,u 3 ,u 4l y 2 ) 



c(u 2 ,u 3 ,yx) 

f{y 2 \u 2 )p(u 2 \ui)p(u 3 \ui, U 2 )P(U 4 \U2, U 3 ) 

c(u 2 ,u 3 ,y 2 ) 
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A crucial point is applying these recursions is the efficient implementation. At this 
regard, it is worth noting that for the first-order HM model we can express the recursion 
in matrix notation and then efficiently implement it in languages such as Matlab and 
R. Details on this are provided in Appendix. 

4 Maximum likelihood estimation using the proposed 
recursion 

Given a sequence of observations yi, . . . ,yr collected in y, the model log- likelihood is 

£(6) = logp(y) (8) 

where 6 is vector collecting all model parameters. The structure of depends on the 
specific parametrization which is adopted for the conditional response distribution f(y t \u t ) 
and the initial and transition probabilities p{ut\u max (t-h,i) , ■ ■ ■ , Ut-i)- For instance, for the 
HM-SV model illustrated in Example 1, includes the initial and transition probabilities 
\«max(t-h an d 7l v 1 ,...,v h+1 and the standard deviations a v . We recall that, in this case, 
the probabilities TT vl! ... jVh+1 are common to all t > h, being the underlying Markov chain 
time homogenous. 

In the following, we show how to compute the log-likelihood in (8) and implement 
its maximization by the recursion developed in the previous section. It has to be clear 
that the same algorithm may be used in with longitudinal data, even in the presence of 
individual covariates. 

First of all, for any sequence of latent states u±, . . . , Ut collected in u, we simply have 
that 

/ x = f(u,y) = Ut f(yt\ut)p(u t \u max{t ^ h:1) , Mt_i) 

q( u \y) Ili9( w tl w nMK(t-M)»-"' w *-i»i') 

where /(it, y) refers to the joint distribution of U\, . . . , Ut and Y\, . . . ,Yt and q(u\y) to 
the posterior distribution of Ui, . . . , Ut given Yi, . . . , Yt- Consequently, given an arbitrary 
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sequence u, say that with all states equal to 1, we compute the model log-likelihood as 

ela\ i /(2/tkt)pKl u max(t-ft,l),---,Wt_i) 

^i 6 ) = 1, lo § — n s — 

^ 9(Mt|l*niax(t-M))---) U t-l)2/) 

on the basis of the proposed recursion. Note that, in this way, we do not need to use any 
renormalization, which are instead necessary in the Baum and Welch recursions; see also 
Lystig and Hughes (2002). 

In order to maximize £(d), we can use an Expectation-Maximization algorithm that 
follows the same principle as that illustrated by Baum et al. (1970). In particular, this 
algorithm is based on the complete data log-likelihood 



t u t 

+ Yl ■ ■ ' E ^-(^M)-«< ^PKI^maxfi-Ll), • • • , (9) 

* M max(i-l,/i) u t 

where w t)Ut is a dummy variable equal to 1 if the latent state at occasion t is u t and to 
otherwise and -Zt,u max(t _ h 1) ,...,u t is a corresponding dummy variable for the sequence of 
latent states iimax(t-fc,i) > ■ ■ ■ ,u t , which may be expressed through the product 

^Mmaxti-M) >-> M * = W max(t-/l,l),Vax(t-li,l) ' ' ' W t,Uf 

At the E-step of the EM algorithm, we need to compute the posterior expected values 
of the above dummy variables given the observed data and the current value of the 
parameters. In particular, we have that 

w t>ut = E(w t>ut \y) = q(u t \y), 

^>«m«x(t-/i,i)i-i«t = ^( Z t,u m ^ t ^ hA) ,...,ut\y) = Q{ u max(t-h,l) i ■ ■ ■ > u t\v)- 

In particular, from the proposed recursion, we directly obtain q{u\\y). Then, for t > 1, 
we exploit a trivial forward recursion: 

?(Wmax(t-M)> • • • i u t\v) = 

q(u t \u max ( t - h>1 ), . . . ,u t -i,y)q(u max (t-h,i), ■ ■ .,iH-i\y), t = 2,...,h+l, , . 
q(u t \u t ^ h ,...,u t _ 1 ,y)Y JUt _ h _ 1 < l( u t~h~i,---,Ut-i\y), t = h + 2, . . . , T, 
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to be performed for t = 1, . . . , T. Then, q(ut\y) is computed by a suitable marginalization. 
How to formulate the above forward recursion in matrix notation, so as to efficiently 
implement it, is illustrated in Appendix. 

As usual, the M-step of the EM algorithm consists of maximizing £*(9), once the 
dummy variables in (9) are substituted by the corresponding expected values obtained as 
above. The following example clarify how to implement this step for a specific model. 

Example 4 For the HM-SV model illustrated in Example 1, the parameters o~ v are up- 
dated at the M-step as follows: 



<r v = \ ^ ; , v = l,...,k. 

Moreover, for the initial probabilities we have 

Ai, Ul = w ljUl , ui = 1, . . . , fc, 
and for the transition probabilities, we have 



_ Zf > ti mai(t-/.,l) , — ,Ut 

A ^M m ax(t-fe,l) >•••>"* ~~ ' M max(i-/i,l)' ■ ■ ■ 1 U t ~ 

2— IV Z *. u max(t-h,l)'---'' it i-l'' !; 



for t = 1, . . . , h and 

^it>n -'<>' j \,...,uf l +i , 

^vi,...,Vh+i — ~ ' ' ' • ' V h+1 ~~ 1, . . . , K. 



2~2t>h Zt,vi 
Z~2v Z~2t>h Zt,v,...,v h ,v 



Clearly, the posterior probability obtained by the proposed recursion may also be used 
for local decoding (Juang and Rabiner, 1991), that is to find the most likely value Ut of 
the latent state U t , given the observed data. In particular, u t is found as the value that 
maximize the posterior probability q(ut\y). 

Finally, on the basis of a sequence of h latent states of the type ux-h+i, ■ ■ ■ ,ut, which 
may be even fixed by the local decoding method, it is possible to predict the latent state at 
occasion T+ 1, denoted by Ut+i, as the value which maximizes q(uT+i\uT-h+i, ■ ■ ■ > ut, y)] 
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this posterior probability directly derives from the proposed recursion. We can also predict 
the manifest distribution of Vr+i through the following finite mixture 



5 An application 

In order to illustrate the proposed approach, we fitted the HM version of the stochastic 
volatility model described in Example 1 to the SP500 data for the period from the be- 
ginning of 2008 to the end 2011. The observed outcome is the percentage log-return with 
respect to the previous closing day, so that we have T = 1007 observations. 

For the above data, we estimated the model at issue for different values of h (order 
of the latent Markov chain) and different values of the number of k (number of latent 
sates), by the EM algorithm outlined in the previous section. The aim of this preliminary 
analysis is to check if the assumption that the volatility level follows a first-order process is 
plausible. This means that the level of volatility in a given day only depends on that of the 
previous day. This hypothesis may be compared with that of a higher-order dependence, 
in which the level of volatility in a given day also depends on the volatility of, say, the 
previous two days. 

The results of the preliminary fitting are reported in Table 1 in terms of log-likelihood, 
number of parameters, computed as in (1), and Bayesian Information Criterion (BIC; 
Schwarz, 1978). Note that we also include results for the model with h — 0, which assumes 
independence between the volatility levels corresponding to different time occasions. 

According to BIC, the observed data supports the hypothesis of a first-order depen- 
dence of the stochastic volatility. In fact, the smallest value of the BIC index, among 
those in Table 1, is observed for h — 1 and k = 3. For this model, we report in Table 2 
the estimates of the parameters of main interest. 




UT+l 



13 









k 








h 


1 


2 


3 


4 







-2026.60 


-1898.73 


-1887.46 


-1885.57 


log-lik. 


1 


-2026.60 


-1819.45 


-1778.00 


-1764.06 




2 


-2026.60 


-1807.69 


-1768.97 


-1746.45 







1 


3 


5 


7 


#par 


1 


1 


5 


11 


19 




2 


1 


9 


29 


67 







4060.12 


3818.19 


3809.50 


3819.54 


BIC 


1 


4060.12 


3673.48 


3632.05 


3659.49 




2 


4060.12 


3677.61 


3738.46 


3956.18 



Table 1: Results from the preliminary fitting, in terms of maximum log-likelihood, number 
of parameters, and BIC, of the HM-SV model for different values of h (latent Markov 
chain order) and k (number of latent states). 



V 


a v 


Vl 








v 2 = 1 


v 2 = 2 


v 2 = 3 


1 


0.865 


1 


0.988 


0.010 


0.002 


2 


1.609 


2 


0.013 


0.981 


0.006 


3 


3.770 


3 


0.000 


0.025 


0.975 



Table 2: Estimates of the parameters a v (levels of volatility) and tt v1jV2 (transition proba- 
bilities) under the HM-SV model with h = 1 and k = 3. 



We then observe three distinct levels of stochastic volatility and very high persistence 
in the volatility level, since the probabilities in the transition matrix in Table 2 are very 
close to 1. As a comparison, we report in Table 3 the corresponding parameter estimates 
under the model with h = 2 and k = 3. 

We observe that the estimated levels of volatility under the second-order model are 
similar to those under the first-order model. Moreover, we again note a high persistence, 
in the sense that it VuV2)V3 is very close to 1 whenever vi = v 2 = V3. The estimates of these 
transition probabilities for v\ 7^ v 2 seem to be less reasonable, especially when v\ = 3 and 



11 



V 


CT,, 

V 


V\ 

1 i 






7T 




IH = 1 


= 2 


iin = 3 


1 


0.842 


1 


1 


0.979 


0.021 


0.000 


2 


1.725 


1 


2 


0.909 


0.091 


0.000 


3 


4.047 


1 


3 


0.585 


0.000 


0.415 






9 
z 


i 


nil'? 

U. 1 io 




n m a 






2 


2 


0.027 


0.966 


0.007 






2 


3 


1.000 


0.000 


0.000 






3 


1 


0.000 


0.000 


1.000 






3 


2 


0.000 


1.000 


0.000 






3 


3 


0.000 


0.035 


0.965 



Table 3: Estimates of the parameters o~ v (levels of volatility) and 7r Vl ^ V2jV3 (transition prob- 
abilities) under the HM-SV model with h = 2 and k = 3. 



v<2 = \. However, we have to consider that a jump from state 3 to state 1 is very rare 
and then there is no support from the data to estimate a transition probability given this 
pair of states. This confirms that the first-order model is preferable for the data at hand 
and may provide more reliable estimates. In any case, the possibility to estimate a higher 
order HM model, which is allowed by the proposed recursion, is important in order to 
have a counterpart against which comparing the more common first-order model. 



Appendix: the recursion in matrix notation 

First of all let p t be the column vector of prior probabilities p(u t \u max (t-h,i), ■ ■ ■ , u t -i) 
arranged in lexicographical order so that, for instance, with h = 2 and k = 2 we have 





= 1 


u t - 


-2 


= 


-l 


= 1 A 


p{u t 


= 2 


Uf- 


-2 




-l 


= 1) 


p(ut 


= 1 


Ut- 


-2 


= 1, Ut- 


-l 


= 2) 


p{u t 


= 2 


Ut- 


-2 


= 


-l 


= 2) 


p{u t 


= 1 


Uf- 


-2 


= 2, 


-l 


= 1) 


p{u t 


= 2 


Ut- 


-2 


= 2,ut_ 


-l 


= 1) 


p(th 


= 1 


Ut- 


-2 


= 2, w t _ 


-l 


= 2) 


XP(u t 


= 2 


Uf- 


-2 


= 2, u t - 


-i 


= 2) J 



i = 3,...,T. 
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Note that for t = 1 this is a vector of initial probabilities and that the number of elements 
of p t is k dt '°, where, in general, d t j = min(t — 1, h) + j + 1. Also let f t denote the 
column vector with elements f(yt\ut), u t = 1, ■ ■ ■ ,k, and let q t j denote the column vector 
of the posterior probabilities q(ut\u max (t-h,i)j • • • , u t-i, Ut+i, ■ ■ ■ , u t +j, y) again arranged in 
lexicographical order. With h = 1, k = 2, and j = 1, for instance, we have 





= l 


U t - 


-i 


= 1- 


U t +1 




q(u t 


= l 


u t - 


-i 


= 1- 


u t +i 


= 2,y) 




= 2 


u t - 


-i 


= 1- 


U t +1 


= 1,2/) 


q(ut 


= 2 


u t - 


-i 


= 1- 


u t +i 


= 2,y) 


q(u t 


= 1 


u t - 


-i 


= 2. 


U t +1 


= 1,2/) 


q(ut 


= 1 


u t - 


-i 


= 2. 


u t +i 


= 2,2/) 


q(ut 


= 2 


u t - 


-i 


= 2. 


U t +1 


= 1,2/) 


\q(ut 


= 2 


u t - 


-i 


= 2. 


U t +1 


= 2,2/)^ 



The dimension of this vector is k dt ' j ; note that for j = these vectors contain the target 
posterior probabilities in (4). 

Finally, let M a ,b be a marginalization matrix such that, given a column vector v with 
elements indexed by a sequence of b variables assuming k possible values (as the above 
vectors), M a ^v provides the corresponding vector in which the elements are summed 
with respect to the a-th of these variables. This matrix may be simply constructed by 
following Kronecker product M a ^ = (S'/Li where 

= / 1*, 1 = a - 

with 1^ denoting a column vector of /c ones and Ik denoting an identity matrix of the 
same dimension. For instance, in the same context of the example that led to the vector 
q t in (11), we have 



(qiu t = 


l 


U t -1 


= 1, 


u t +i 


= 1,2/)" 


f g(«t 


= 2 




= 1, 


Mi+l 


= l,2/)\ 


q(u t = 


l 


U t -1 


= 1, 


u t +i 


= 2,2/)- 


+- 


= 2 


Mt-l 


= 1- 


«t+l 


= 2,2/) 


q(u t = 


l 


U t -l 


= 2, 


u t +i 


= 1,2/)- 


f ?(«t 


= 2 


U t -l 


= 2. 


«t+l 


= 1,2/) 


\q(u t = 


l 


U t -1 


= 2, 


U t +1 


= 2,2/)- 




= 2 


Ut-1 


= 2. 




= 2,2/)/ 



Using the above notation, for t = T we directly obtain the target vector q t through 



1(3 



the following operations, which directly derive from (5): 

«T,0 = (l fc (*r,o-i) ® /t) x Pti 

<?t,o = a r/(M"^ 0jdTo M dTi0idTO a Ti0 ), 

where x and / denote, respectively, elementwise product and division. Then, for t = 
1, . . . ,T — 1 (in reverse order), we first compute q t j for j = min(T — t, h) and then 
we recursively compute q t j from j = min(T — t, h) — 1 to j = 0. In particular, for 
j = min(T — t, h) we compute the vector dtj containing the elements at the numerator 
of (7) by the following recursion: 

a*,o = (l fc K,o-D ® ft) x Pv 

a t> i = (a y _i <g> l fe ) x (l Jfe (d ti ,-d t+I>0 ) / = l,...,j. 

Then, we have 

9tj = a *,i/( M d t ,oAj M *,oAi a *j)- 
Finally, from (6) we have the recursion 

s t,j+i = (l fe ( d t,j+i- d t+j+i,o) ® <7t+j+i,o)/<Ztj+i> 

9tj = 1 jfe <J *,j/(- M 'tit,i+i.d*,j+i s *.J+l)> 

to be applied for j = min(T — t,h) — l until j = 0, when we obtain the target vector q t . 

In order to express the forward recursion in (10) using the matrix notation, let q* t de- 
note the vector with elements q(u max (t-h,i), ■ ■ ■ > u t\v) arranged in the usual lexicographical 
order. Then, we have q\ = q l , whereas for t > 1, we have 

a * = [ %Q x (9t-i ® t = 2, . . . , /i + 1, 

9 * \ q tfl x [(M li/l+1 q t *_i) ® Ifc], t = h + 2,...,T. 
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